The Compilation Process

Compilation involves five main steps: lexical analysis, syntax analysis, semantic analysis, code generation and code optimisation

Interpreters use the first three steps to find the next instruction

Lexical Analysis

Lexical analysis is also known as scanning
The source code is just a text file containing a long string of characters

Example

total = (a + b) * vat;
The scanner will split this into a list of tokens based on whitespace and punctuation

IDENTIFIER total
EQUALS
OPEN_PAREN
IDENTIFIER a
PLUS
IDENTIFIER b
CLOSE_PAREN
MULTIPLY
IDENTIFIER vat
SEMICOLON

The source file will be turned into a long list of tokens that are passed to the next step
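
The scanning step above can be sketched in C. This is a minimal illustration rather than how any particular compiler does it: the `Token` struct, the `TOK_` names and the `scan` function are made up for this example, and it only recognises the symbols that appear in the line above.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds matching the names in the example token list (hypothetical enum). */
typedef enum {
    TOK_IDENTIFIER, TOK_EQUALS, TOK_OPEN_PAREN, TOK_CLOSE_PAREN,
    TOK_PLUS, TOK_MULTIPLY, TOK_SEMICOLON, TOK_UNKNOWN
} TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* only filled in for identifiers */
} Token;

/* Hypothetical helper: split src into tokens, return how many were found. */
int scan(const char *src, Token *out, int max_tokens) {
    int count = 0;
    size_t i = 0;
    while (src[i] != '\0' && count < max_tokens) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) {            /* whitespace only separates tokens */
            i++;
            continue;
        }
        Token t = { TOK_UNKNOWN, "" };
        if (isalpha(c) || c == '_') {
            /* identifier: letter or underscore, then letters, digits, underscores */
            size_t len = 0;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') {
                if (len < sizeof t.text - 1) t.text[len++] = src[i];
                i++;
            }
            t.text[len] = '\0';
            t.kind = TOK_IDENTIFIER;
        } else {
            /* single-character punctuation */
            switch (c) {
                case '=': t.kind = TOK_EQUALS;      break;
                case '(': t.kind = TOK_OPEN_PAREN;  break;
                case ')': t.kind = TOK_CLOSE_PAREN; break;
                case '+': t.kind = TOK_PLUS;        break;
                case '*': t.kind = TOK_MULTIPLY;    break;
                case ';': t.kind = TOK_SEMICOLON;   break;
                default:  t.kind = TOK_UNKNOWN;     break;
            }
            i++;
        }
        out[count++] = t;
    }
    return count;
}
```

Calling `scan("total = (a + b) * vat;", tokens, 16)` on an array of 16 `Token` values fills in the ten tokens listed above, in the same order.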

Syntax Analysis

Syntax analysis is also known as parsing
The list of tokens is checked to see if it forms a valid program

If it does, the parser builds an abstract syntax tree (AST), which arranges the tokens into a structure that represents the different parts of the code

Example
int modulo(int x, int y) {
	int result = (x % y);
	return result;
}

![[Pasted image 20250519093813.png]]
This is the modulo function's AST
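
To give a feel for how a tree like this might be stored, here is a rough sketch in C that covers only the expression (x % y). The `Node` struct, the `NODE_` names and the `evaluate` function are assumptions made for this example, and a real AST would need many more node types (function, declaration, return and so on).

```c
#include <stddef.h>

/* Hypothetical node kinds: just enough for the expression (x % y). */
typedef enum { NODE_VARIABLE, NODE_MODULO } NodeKind;

typedef struct Node {
    NodeKind kind;
    char name;                  /* variable name, used by NODE_VARIABLE leaves */
    const struct Node *left;    /* operands, used by NODE_MODULO */
    const struct Node *right;
} Node;

/* Walk the tree, substituting the given values for the variables x and y. */
int evaluate(const Node *node, int x, int y) {
    switch (node->kind) {
        case NODE_VARIABLE:
            return node->name == 'x' ? x : y;
        case NODE_MODULO:
            return evaluate(node->left, x, y) % evaluate(node->right, x, y);
    }
    return 0;   /* not reached for a well-formed tree */
}
```

A modulo node whose left child is the variable x and whose right child is the variable y represents (x % y). Walking the tree from the root visits the operands before applying the operator, which is why the tree shape captures the order of evaluation.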

Semantic Analysis

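Semantic analysis adds meaning to the program, for example by checking that variables are declared before they are used and that types are used consistently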

Code Generation

Code generation turns the AST into actual machine code instructions

![[Pasted image 20250519094431.png]]
Here's the generated assembly code for the modulo function
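
The idea of walking the tree and emitting instructions can be sketched as follows. This is only an illustration: it produces text for a made-up stack machine with two invented instructions (LOAD and MOD), not real assembly like the listing above, and the `Node` struct and `generate` function are assumptions for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical node kinds: just enough for the expression (x % y). */
typedef enum { NODE_VARIABLE, NODE_MODULO } NodeKind;

typedef struct Node {
    NodeKind kind;
    char name;                  /* variable name, used by NODE_VARIABLE leaves */
    const struct Node *left;    /* operands, used by NODE_MODULO */
    const struct Node *right;
} Node;

/* Append instructions for node to out, a NUL-terminated buffer of cap bytes.
   The walk is post-order: both operands are emitted before the operator. */
void generate(const Node *node, char *out, size_t cap) {
    size_t used = strlen(out);
    if (node->kind == NODE_VARIABLE) {
        snprintf(out + used, cap - used, "LOAD %c\n", node->name);
    } else {
        generate(node->left, out, cap);
        generate(node->right, out, cap);
        used = strlen(out);
        snprintf(out + used, cap - used, "MOD\n");
    }
}
```

For the tree representing (x % y), starting from an empty buffer, this emits LOAD x, then LOAD y, then MOD. A real code generator also has to choose registers and match the target processor's instruction set.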

Code Optimisation

During the optimisation step the code is analysed to see if there are ways to improve it, for example by making it run faster or take up less memory

There are lots of optimisation techniques

Optimisation can happen in multiple places during the compilation process

Code Optimisation Techniques

Loop Invariant Code

Consider this fragment of code:

for(int i = 0; i <= n; i++) {
	foo = amp + 5;
	sum = sum + (foo * i);
}

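The calculation of foo happens each time round the loop, but its value never changes because amp is not modified inside the loop
The calculation can be moved out of the loop so that it happens only once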

foo = amp + 5;
for(int i = 0; i <= n; i++) {
	sum = sum + (foo * i);
}
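
To check that the two versions behave the same, both loops can be wrapped in functions and compared. The function names are made up for this example, and it assumes sum starts at 0, which the fragments above do not show.

```c
/* Original form: foo is recalculated on every pass through the loop. */
int sum_recomputed(int amp, int n) {
    int sum = 0;    /* assumed starting value */
    int foo;
    for (int i = 0; i <= n; i++) {
        foo = amp + 5;
        sum = sum + (foo * i);
    }
    return sum;
}

/* Optimised form: the invariant calculation is moved out of the loop. */
int sum_hoisted(int amp, int n) {
    int sum = 0;    /* assumed starting value */
    int foo = amp + 5;
    for (int i = 0; i <= n; i++) {
        sum = sum + (foo * i);
    }
    return sum;
}
```

For any amp and any n of 0 or more, both functions return the same value, but the second one does the addition amp + 5 once instead of n + 1 times.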

Strength Reduction

Consider this fragment of code:

for(int i = 0; i<=n; i++) {
	sum = sum + i;
}

The loop can be removed completely and replaced with a simple calculation
sum = (n * (1 + n)) / 2;
This gives the same result (the sum of all integers from 1 to n), assuming sum starts at 0 and n is not negative
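
The loop and the formula can be compared directly. This sketch uses made-up function names and assumes sum starts at 0 and n is not negative. Note that the term strength reduction is also commonly used for swapping an expensive operation for a cheaper one, such as replacing a multiplication inside a loop with a running addition. The example here goes further and removes the loop entirely.

```c
/* Loop form: adds every integer from 0 to n. */
int sum_with_loop(int n) {
    int sum = 0;    /* assumed starting value */
    for (int i = 0; i <= n; i++) {
        sum = sum + i;
    }
    return sum;
}

/* Formula form: one multiplication, one addition, one division. */
int sum_with_formula(int n) {
    return (n * (1 + n)) / 2;
}
```

Both functions return 5050 for n = 100. The loop takes 101 passes to get there, while the formula takes the same small number of operations whatever the value of n.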

Code optimisation can make the code harder to understand

Linking Phase and Symbol Resolution

A large program will be split across multiple source files